Abstract: Data availability is critical in distributed storage systems, especially because node failures are prevalent in practice. A key requirement is to minimize the effort of recovering the lost or unavailable data of failed nodes. To ensure data availability, a storage system typically introduces data redundancy via replication or erasure coding. Erasure coding is adopted in large-scale storage systems because it incurs less redundancy overhead than replication at the same level of fault tolerance. The erasure-coded storage system supports recovery from both single and concurrent failures and aims to minimize the bandwidth consumed by recovery. The performance of degraded reads is improved by addressing both I/O parallelism and node heterogeneity. Temporarily unavailable data is retrieved quickly with the support of the RAID Node, which reconstructs the unavailable block from the surviving nodes. Performance is evaluated and compared using a Word Count job, which computes the number of occurrences of each word in a document.

Keywords: Erasure-coded storage system, degraded reads, RAID Node.